3.3 Bayes Estimation for Frequentists

1 Bayes Risk and Bayes Estimator

1.1 Frequentist Motivation

Consider a model $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ for data $X$, with loss $L(\theta, d)$ and risk $R(\theta, \delta) = E_\theta[L(\theta, \delta(X))]$.
The Bayes risk is the average-case risk, integrated with respect to some measure $\Lambda$ on $\Theta$, called the prior.
For now, assume $\Lambda(\Theta) = 1$ (a probability measure). Later we will allow it to be improper ($\Lambda(\Theta) = \infty$).

$$r_\Lambda(\delta) = \int_\Theta R(\theta, \delta)\, d\Lambda(\theta) = E_{\theta \sim \Lambda}[R(\theta, \delta)] = E_{\theta \sim \Lambda}\big[E[L(\theta, \delta(X)) \mid \theta]\big] = E[L(\theta, \delta(X))].$$

(Here we assume $\theta \sim \Lambda$ and $X \mid \theta \sim P_\theta$; the last expectation $E$ is taken with respect to the joint distribution of $(\theta, X)$.)

An estimator $\delta$ minimizing $r_\Lambda(\cdot)$ is called a Bayes estimator. It depends on $\mathcal{P}$, $\Lambda$, and $L$. (By the tower property again, $r_\Lambda(\delta) = E\big[E[L(\theta, \delta(X)) \mid X]\big]$.)
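The identity $r_\Lambda(\delta) = E[L(\theta, \delta(X))]$ under the joint law of $(\theta, X)$ can be checked by Monte Carlo. A minimal sketch, where the Beta$(2,2)$ prior, Binomial$(10, \theta)$ likelihood, and the plug-in estimator $\delta(X) = X/10$ are all illustrative choices (not from the notes):

```python
import random

random.seed(0)

def draw_binomial(n, p):
    # Binomial(n, p) sampler via Bernoulli sums (stdlib only).
    return sum(random.random() < p for _ in range(n))

def delta(x, n=10):
    # Illustrative frequentist estimator: the MLE x/n.
    return x / n

# Bayes risk as the joint expectation E[L(theta, delta(X))]:
# draw theta ~ Lambda, then X | theta ~ P_theta, then average the loss.
N = 100_000
total = 0.0
for _ in range(N):
    theta = random.betavariate(2, 2)      # theta ~ Lambda = Beta(2, 2)
    x = draw_binomial(10, theta)          # X | theta ~ Binomial(10, theta)
    total += (theta - delta(x)) ** 2      # squared-error loss
r_bayes = total / N
print(round(r_bayes, 4))
```

Since $R(\theta, \delta) = \mathrm{Var}(X/10) = \theta(1-\theta)/10$ here, the estimate should land near the exact Bayes risk $E[\theta(1-\theta)]/10 = 0.02$ for a Beta$(2,2)$ prior.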

1.2 Prior and Posterior

Now we can explicitly define the densities:

Definition

  • Prior: $\lambda(\theta)$.
  • Likelihood: $p_\theta(x)$.
  • Joint density: $\lambda(\theta)\, p_\theta(x)$.
  • Marginal density: $q(x) = \int_\Theta \lambda(\theta)\, p_\theta(x)\, d\theta$.
  • Posterior density: $\lambda(\theta \mid x) = \lambda(\theta)\, p_\theta(x) / q(x)$.

The Bayes estimator depends on the data only through the posterior:

$$\delta_\Lambda(x) = \operatorname*{argmin}_d E[L(\theta, d) \mid X = x] = \operatorname*{argmin}_d \int_\Theta L(\theta, d)\, \lambda(\theta \mid x)\, d\theta.$$

The Bayes estimator can thus be found "one $x$ at a time": for each observed $x$, minimize the posterior expected loss.
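A minimal sketch of the "one $x$ at a time" recipe on a discretized parameter space (the Beta$(2,2)$ prior, the Binomial$(10,\theta)$ likelihood, the observed $x = 7$, and the grids are all illustrative choices):

```python
import math

# Grid approximation to a Beta(2, 2) prior on theta in (0, 1).
grid = [i / 100 for i in range(1, 100)]
prior = [t * (1 - t) for t in grid]          # Beta(2, 2) density, up to a constant
Z = sum(prior)
prior = [p / Z for p in prior]

n, x = 10, 7                                 # observe X = 7 from Binomial(10, theta)

def likelihood(t):
    return math.comb(n, x) * t ** x * (1 - t) ** (n - x)

# Posterior lambda(theta | x) is proportional to prior * likelihood.
post = [p * likelihood(t) for p, t in zip(prior, grid)]
Zp = sum(post)
post = [w / Zp for w in post]

# "One x at a time": minimize the posterior expected loss over candidates d.
def post_exp_loss(d):
    return sum(w * (t - d) ** 2 for w, t in zip(post, grid))

d_star = min((i / 1000 for i in range(1001)), key=post_exp_loss)
post_mean = sum(w * t for w, t in zip(post, grid))
print(round(d_star, 3), round(post_mean, 3))
```

Under squared-error loss the grid minimizer lands on the posterior mean, previewing Section 2.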

Theorem

Suppose $X \mid \theta \sim P_\theta$ and $L(\theta, d) \ge 0$, with $r_\Lambda(\delta_0) < \infty$ for some estimator $\delta_0(X)$. Then $\delta_\Lambda$ is Bayes with $r_\Lambda(\delta_\Lambda) < \infty$ if and only if $\delta_\Lambda(x) \in \operatorname*{argmin}_d E[L(\theta, d) \mid X = x]$ for a.e. $x$.

2 Posterior Mean

2.1 Square Error Loss

If $L(\theta, d) = (g(\theta) - d)^2$, then the Bayes estimator is the posterior mean:
$$E[(g(\theta) - d)^2 \mid X] = E\big[(g(\theta) - E[g(\theta) \mid X] + E[g(\theta) \mid X] - d)^2 \mid X\big] = \operatorname{Var}(g(\theta) \mid X) + (E[g(\theta) \mid X] - d)^2,$$
so $\delta_\Lambda(X) = E[g(\theta) \mid X]$.

2.2 Weighted Square Error

If $L(\theta, d) = w(\theta)(g(\theta) - d)^2$ (e.g. $\left(\frac{\theta - d}{\theta}\right)^2$), then
$$E[(d - g(\theta))^2 w(\theta) \mid X] = d^2 E[w(\theta) \mid X] - 2d\, E[w(\theta) g(\theta) \mid X] + E[w(\theta) g(\theta)^2 \mid X],$$
which is minimized at
$$d = \frac{E[w(\theta) g(\theta) \mid X]}{E[w(\theta) \mid X]}.$$
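A quick numerical check of the weighted minimizer $d = E[w(\theta) g(\theta) \mid X] / E[w(\theta) \mid X]$ on a toy three-point posterior (the support, weights, and $w(\theta) = 1/\theta^2$ are illustrative choices, with $g(\theta) = \theta$):

```python
# Toy discrete posterior lambda(theta | x) on three support points.
support = [0.2, 0.5, 0.8]
post = [0.3, 0.5, 0.2]
w = lambda t: 1 / t ** 2            # weight from the loss ((theta - d)/theta)^2

def exp_loss(d):
    # Posterior expected weighted squared error at candidate d.
    return sum(p * w(t) * (t - d) ** 2 for p, t in zip(post, support))

# Closed form: ratio of posterior expectations E[w*theta | x] / E[w | x].
closed_form = (sum(p * w(t) * t for p, t in zip(post, support))
               / sum(p * w(t) for p, t in zip(post, support)))

# Brute-force grid minimization should land on the same value.
grid_min = min((i / 10000 for i in range(10001)), key=exp_loss)
print(round(closed_form, 4), round(grid_min, 4))
```

Note how the weight $1/\theta^2$ pulls the estimate toward small $\theta$ relative to the unweighted posterior mean.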

2.3 Other Examples

In several examples, the posterior mean is a weighted average of the sample mean and the prior mean. As $n$ grows, the weight on the sample dominates.
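A minimal sketch of this weighting in the Normal-Normal case: for $X_i \mid \theta \sim N(\theta, \sigma^2)$ and $\theta \sim N(\mu, \tau^2)$, the posterior mean is the precision-weighted average $\frac{n/\sigma^2}{n/\sigma^2 + 1/\tau^2}\bar{X} + \frac{1/\tau^2}{n/\sigma^2 + 1/\tau^2}\mu$ (the particular numbers below are illustrative):

```python
def normal_posterior_mean(xbar, n, sigma2, mu, tau2):
    # Posterior mean for X_i | theta ~ N(theta, sigma2), theta ~ N(mu, tau2):
    # a precision-weighted average of the sample mean and the prior mean.
    w_data = (n / sigma2) / (n / sigma2 + 1 / tau2)
    return w_data * xbar + (1 - w_data) * mu

# As n grows, the weight on the sample mean xbar = 2.0 approaches 1.
for n in (1, 10, 1000):
    print(n, round(normal_posterior_mean(xbar=2.0, n=n, sigma2=1.0, mu=0.0, tau2=1.0), 4))
```

With $\mu = 0$ and $\bar{X} = 2$, the output climbs from $1.0$ at $n = 1$ toward $2.0$ as $n \to \infty$: the data overwhelm the prior.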

If the posterior belongs to the same family as the prior, we say the prior is conjugate to the likelihood. This is most common in exponential families.

3 Conjugate Priors

Suppose $X_i \mid \eta \overset{\text{i.i.d.}}{\sim} p_\eta(x) = e^{\eta^T T(x) - A(\eta)} h(x)$, $\eta \in \Xi \subseteq \mathbb{R}^s$, $i = 1, \dots, n$.
For any carrier $\lambda_0(\eta)$, define the $(s+1)$-dimensional family
$$\lambda_{\mu, k}(\eta) = e^{k\mu^T \eta - kA(\eta) - B(\mu, k)} \lambda_0(\eta),$$
whose sufficient statistic is $(\eta, -A(\eta))$ and whose natural parameter is $(k\mu, k)$.
Then
$$\lambda(\eta \mid x_1, \dots, x_n) \propto_\eta \Big( \prod_{i=1}^n e^{\eta^T T(x_i) - A(\eta)} h(x_i) \Big) e^{k\mu^T \eta - kA(\eta) - B(\mu, k)} \lambda_0(\eta) \propto_\eta e^{(k\mu + \sum_{i=1}^n T(x_i))^T \eta - (k+n) A(\eta)} \lambda_0(\eta),$$
so $\lambda(\eta \mid x_1, \dots, x_n) = \lambda_{\mu_{\mathrm{post}},\, n+k}(\eta)$, where
$$\mu_{\mathrm{post}} = \frac{k\mu + n\bar{T}}{k + n}, \qquad \bar{T}(X) = \frac{1}{n} \sum_{i=1}^n T(X_i).$$
Hence
$$\mu_{\mathrm{post}} = \frac{n}{k+n} \underbrace{\bar{T}}_{\text{UMVUE from data}} + \frac{k}{k+n} \underbrace{\mu}_{\text{``UMVUE'' from ``pseudo data''}}.$$
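A minimal numerical sketch of the update $\mu_{\mathrm{post}} = (k\mu + n\bar{T})/(k+n)$ in the Bernoulli case, where $T(x) = x$ and the standard identification $k = \alpha + \beta$, $k\mu = \alpha$ recovers the Beta posterior mean (the data and the Beta$(2,3)$ prior are illustrative choices):

```python
# Bernoulli observations; the sufficient statistic is T(x) = x.
data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
s, n = sum(data), len(data)

alpha, beta = 2.0, 3.0                 # illustrative Beta(2, 3) prior on theta
k = alpha + beta                       # pseudo-sample size of the prior
mu = alpha / (alpha + beta)            # prior "pseudo-data mean"

# Conjugate update from the notes: weighted average of mu and T-bar.
mu_post = (k * mu + s) / (k + n)

# Familiar Beta-Bernoulli posterior Beta(alpha + s, beta + n - s) mean.
beta_post_mean = (alpha + s) / (alpha + beta + n)
print(round(mu_post, 4), round(beta_post_mean, 4))
```

The two expressions agree by the arithmetic identity $(k\mu + s)/(k + n) = (\alpha + s)/(\alpha + \beta + n)$ when $k\mu = \alpha$ and $k = \alpha + \beta$.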

3.1 Conjugate Prior Examples

  • Likelihood $X_i \mid \theta \sim \mathrm{Binomial}(n, \theta)$; conjugate prior $\theta \sim \mathrm{Beta}(\alpha, \beta)$.
  • Likelihood $X_i \mid \theta \sim N(\theta, \sigma^2)$; conjugate prior $\theta \sim N(\mu, \tau^2)$.
  • Likelihood $X_i \mid \theta \sim \mathrm{Poisson}(\theta)$; conjugate prior $\theta \sim \mathrm{Gamma}(\nu, s)$.
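A numerical illustration of the Poisson-Gamma pair, assuming the shape/rate parametrization $\mathrm{Gamma}(\nu, s)$ with mean $\nu/s$ (the notes do not spell the parametrization out, and the data and prior below are illustrative):

```python
# Poisson counts; under Gamma(shape=nu, rate=s) the posterior after
# observing x_1, ..., x_n is Gamma(nu + sum x_i, s + n).
data = [3, 1, 4, 2, 0, 2]
nu, s_rate = 3.0, 1.0                   # illustrative Gamma(3, 1) prior

nu_post = nu + sum(data)
s_post = s_rate + len(data)
post_mean = nu_post / s_post            # Gamma mean = shape / rate

# The posterior mean is again a weighted average of sample and prior means.
prior_mean = nu / s_rate
sample_mean = sum(data) / len(data)
w = len(data) / (s_rate + len(data))
print(round(post_mean, 4), round(w * sample_mean + (1 - w) * prior_mean, 4))
```

As in Section 2.3, the weight $n/(s+n)$ on the sample mean tends to $1$ as $n$ grows.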